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Abstract 

The present paper proposes generalization of the linear empirical Bayes estimation method 
which takes advantage of the flexibility of the wavelet techniques. We present an empirical 
Bayes estimator as a wavelet series expansion and estimate coefficients by minimizing the 
prior risk of the estimator. As a result, estimation of wavelet coefficients requires solution of 
a well-posed low-dimensional sparse system of linear equations. The dimension of the system 
depends on the size of wavelet support and smoothness of the Bayes estimator. An adaptive 
choice of the resolution level is carried out using Lepskii (1997) method. The method is 
computationally efficient and provides asymptotically optimal EB estimators posterior risks 
of which tend to zero at an optimal rate. The theory is supplemented by numerous examples. 
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1 Introduction 

Empirical Bayes (EB) methods are estimation techniques in which the prior distribution, in the 
standard Bayesian sense, is estimated from the data. They are powerful tools in particular when 
data are generated by repeated execution of the same type of experiment. The EB methods are 
directly related to the standard Bayes models but there is difference in perspective between the 
two in the sense that, in the standard Bayesian approach, the prior distribution is assumed to 
be fixed before any data are observed, whereas, in the EB setting, the prior distribution is, in 
some way or another, estimated from the observed data. 

Consider the following setting. One observes independent two-dimensional random vectors 
(Xi, 9\) , (X n ,9 n ), where each 6i is distributed according to some unknown prior pdf g and, 
given 6i = 8, observation Xi has the known conditional density function q(x\6). In each pair the 
first component is observable, but the second is not. After the (re + l)-th observation y = X n+ \ 
is taken, the goal is to estimate t = 8 n+ i. 

If the prior density g{6) were known, then the Bayes estimator of 9 n+ \ which delivers the 
minimal mean squared risk would be given by the following equation 

t(y) = TL^mw- (L1) 

Since the prior density g(9) is unknown, an EB estimator t(y; X\, X2, ■ ■ ■ , X n ) has to be used. 
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Using notations 



p{y) = / q(y\e)g(9)d9, 



(1.2) 




= / 9q(y\e) g {e)de 



(1.3) 



J —CO 



t{y) can be rewritten as 



t{y) = ^>{y)/p{y). 



(1.4) 



There is a variety of methods which allow to estimate t{y) on the basis of observations 
y;Xi,--- ,X n . After Robbins (1955, 1964) formulated EB estimation approach, many statisti- 
cians have been working on developing EB methods. The comprehensive list of references as 
well as numerous examples of applications of EB techniques can be found in Carlin and Louis 
(2000) or Maritz and Lwin (1989). 

The EB techniques can be divided into two groups: parametric and nonparametric. Para- 
metric EB methods require that the parametric form of a family to which the prior distribution 
belongs is specified a priori. The past data is then used to estimate the values of the unknown 
parameters, usually using the maximum likelihood approach. In nonparametric EB estimation, 
prior distribution is completely unspecified. 

One of the approaches to nonparametric EB estimation is based on estimation of the 
numerator and the denominator in the ratio in (|1.4[) . This approach was introduced by Rob- 
bins (1955, 1964) himself and later developed by a number of authors (see, e.g., Brown and 
Greenshtein, E. (2009), Datta (1991, 2000), Ma and Balakrishnan (2000), Nogami (1988), Pen- 
sky (1997a,b), Raykar and Zhao (2011), Singh (1976, 1979) and Walter and Hamedani (1991) 
among others). The method provides estimators with good convergence rates, however, it re- 
quires relatively tedious three-step procedure: estimation of the top and the bottom of the 
fraction and then the fraction itself. In particular, one of the approaches is to take advantage of 
the fact that, in the case of a one-parameter exponential family, the numerator can be expressed 
as the derivative of the denominator. Wavelets provide an opportunity to construct adaptive 
wavelet-based EB estimators with better computational properties in this framework (see, e.g., 
Huang (1997) and Pensky (1998, 2000, 2002)) but the necessity of estimation of the ratio in ([L41 
remains. Another nonparametric approach developed in Jiang and Zhang (2009), is based on 
application of nonparametric MLE technique which is computationally extremely demanding. 

In 1983, Robbins introduced a much more simple, local nonparametric EB method, linear 
EB estimation. Robbins (1983) suggested to approximate Bayes estimator t(y) locally by a 
linear function of y and to determine the coefficients of t{y) by minimizing the expected squared 
difference between t(y) and 8, with subsequent estimation of the coefficients on the basis of ob- 
servations X\ , X n . The technique is extremely efficient computationally and was immediately 
put to practical use, for instance, for prediction of the finite population mean (see, e.g., Ghosh 
and Meeden (1986), Ghosh and Lahiri (1987) and Karunamuni and Zhang, S. (2003)). 

However, a linear EB estimator has a large bias since, due to its very simple form, it has 
a limited ability to approximate the Bayes estimator t{y). For this reason, linear EB estimators 
are optimal only in the class of estimators linear in y. To overcome this defect, Pensky and Ni 
(2000) extended approach of Robbins (1983) to approximation of t(y) by algebraic polynomials. 
However, although the polynomial-based EB estimation provides significant improvement in the 
convergence rates in comparison with the linear EB estimator, the system of linear equations 
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resulting from the method is badly conditioned which leads to computational difficulties and 
loss of precision. 

To overcome those difficulties, Pensky and Alotaibi (2005) proposed to replace polynomial 
approximation of the Bayes estimator t(y) by its approximation via wavelets, in particular, by 
expansion over scaling functions at the resolution level m. The method exploits de-correlating 
property of wavelets and leads to a low-dimensional well-posed sparse system of linear equations. 
The paper also treated the issue of local asymptotic optimality as n — > oo: if the resolution level is 
selected correctly, in accordance with the smoothness of the Bayes estimator, then the suggested 
EB estimator attains optimal convergence rates. 

However, smoothness of the Bayes estimator t(y) is hard to assess. For this reason, the EB 
estimator of Pensky and Alotaibi (2005) is non-adaptive. One of the possible ways of achieving 
adaptivity would be to replace the linear scaling function based approximation by a traditional 
wavelet expansion with subsequent thresholding of wavelet coefficients. The deficiency of this 
approach, however, is that it yields the system of equations which is much less sparse and is 
growing in size with the number of observations n. 

The purpose of this paper is to provide an adaptive version of the wavelet EB estimator 
developed in Pensky and Alotaibi (2005). In particular, we preserve the structure of the linear 
structure of the estimator. However, since expansion over scaling functions at the resolution 
level m leads to excessive variance when resolution level m is too high and disproportionately 
large bias when m is too small, we choose the resolution level using Lepskii method introduced 
in Lepski (1991) and further developed in Lepskii, Mammen and Spokony (1997). The resulting 
estimator is adaptive and attains optimal convergence rates (within a lolgarithmic factor of n). In 
addition, it has an advantage of computational efficiency since it is based on the solution of low- 
dimensional sparse system of linear equations the matrix of which tends to a scalar multiple of 
an identity matrix as the scale m grows. The theory is supplemented by numerous examples that 
demonstrate how the estimator can be implemented for various types of distribution families. 

The rest of the paper is organized as follows. Section[2]introduces EB estimation algorithm. 
Section [3] assesses estimation error and describes the choice of the resolution level which delivers 
optimal convergence rates when the degree of smoothness of Bayes estimator t(y) is known. 
Section 2] discusses adaptive choice of the resolution level using Lepskii method and proves 
asymptotic optimality of the resulting EB estimator which is based on this choice. Section [5] 
provides examples of construction of EB estimators for a variety of distribution families. Section 
[6] concludes the paper with discussion. Finally, Section [7] contains the proofs of the statements 
in the paper. 

2 EB estimation algorithm 

In order to construct an estimator of t(y) defined in (|1.4p . choose a twice continuously differen- 
tiable scaling function with bounded support and s vanishing moments, so that 



suppy? G [Mi,M 2 ] 





x 



J , 0<j'<s-l. 



(2.2) 
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Approximate t(y) by a wavelet series 

tm{y) = ^2 am > k Prn,k(y) (2-3) 

kez 

where p m ,k{y) = 2 m ^ 2 ip(2 m y — k), and estimate coefficients of t m (y) by minimizing the integrated 
mean squared error 

2 



oo roo 



oo J —oo 



^ a m ,Wm,k{y) ~ z 
kez 



q(y\z)g(z)dz dy => min . (2-4) 



Taking derivatives of the last expression with respect to a m j and equating them to zero, we 
obtain the system of linear equations 



with 



B m 0, m — C m (2-5) 

/oo 

i Pm,k{ X ) l Prn,j{x)p{x)dx = E [<p m j.{X)ip m j(X)\ , (2.6) 
-oo 
roo 

(c m )j = Cj = / f m j(x)^(x)dx. (2.7) 



where we use the symbol E for expectation over the joint distribution of Xi, X2, • • • , X n . The 
expectations over all other distributions are represented in the integral forms. Also, in what 
follows, we suppress index m in notations of matrix B m = B and vector c m = c unless this leads 
to a confusion. 

System (|2,5p is an infinite system of equations. However, since we are interested in esti- 
mating t(x) locally at x = y, we shall keep only indices k,j G K m ^ y where 

K m , u = {k£ Z : 2 m y-M 2 - s(M 2 -M x )<k< 2 m y - M x + s(M 2 - Mi)} (2.8) 

where s is the number of vanishing moments of scaling function ip (see (12. 2D ). Observe that 
really expansion (|2.3p contains only coefficients a m & with 2 m y — M 2 < k < 2 m y — Mi, however, 
in order to ensure fast convergence of the bias of the estimator to zero, we need to keep more 
terms in the system of equations (|2.5p . 

The entries (|2.6[) of the matrix B are unknown and can be estimated by sample means 



Bj,k = n ^ zZ [Vm,k(Xi)pmA X i)\ ■ (2-9) 
1=1 

In order to estimate Cj, one needs to find functions u m j(x) such that for any 9 

/oo 
9q{x\9)(p m j{x)dx. (2-10) 
-00 

Then, multiplying both sides of (I2,10p by g{6) and integrating over 9, we obtain 

roo poo 

Eu mj (X) = / u mj (x)p(x)dx = / ip m:j (x)^(x)dx = cj. 
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Note that functions u m j{x) are the same functions which appear in the wavelet estimator 
of the numerator \&(y) of the EB estimator (jl.4p . therefore, the estimator considered herein can 
be constructed whenever wavelet EB estimation is possible (see, e.g., Pensky (1997a, 1998)). 
In Section [5] we show that solutions of equation (|2.10p can be easily obtained for a variety of 
distribution families. 

Once functions u m j(x) are derived, coefficients cj can be estimated by 

n 

c j =n- 1 Y,u m , ] (X i ) (2.11) 
1=1 

and system (|2.5p is replaced by Ba = c. However, though estimators B and c converge in 
mean squared sense to B and c, respectively, the estimator a = B^ 1 c may not even have finite 
expectation. To understand this fact, note that both B and c are asymptotically normal. In 
one dimensional case, the ratio of two normal random variables has Cauchy distribution and 
hence does not have finite mean. In multivariate case the difficulty remains. To ensure that the 
estimator of a has finite expectation, we choose 5 = 5 m > and construct an estimator of a of 
the form 

a s = (B + 5I)- 1 c (2.12) 

where I is the identity matrix. Observe that matrix B is nonnegative definite, so that (B + 51) 
is a positive definite matrix and, hence, is nonsingular. Solution as is used for construction of 
the EB estimator 

t m (y)= Yl ( 

(y). (2.13) 

k £ Km , y 

3 Estimation error and convergence rates 

3.1 The posterior and the prior risks 

An EB estimator t(y) can be characterized by its posterior risk 

/oo 
(i(y)-9) 2 q(y\9)g(9)d9 
-oo 

which can be partitioned into two components. The first component of this sum is 

/oo 
(%) - 9) 2 q (y\9)g(9)d9, 
-oo 

which is independent of t(y) and represents the posterior risk of the Bayes estimator (jl.ip . Thus, 
we shall measure precision of an EB estimator (12. 13H by the second component 

R n {y) = E(t m (y) - t{y)f (3.1) 

which represents the local error of the EB estimator at the point of observation y. 
It must be noted that often the precision of an EB estimator is described by 

/oo 
Rn(y)p(y)dy, 
-oo 
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which is the difference between the prior risk 



E / R{y-t{y))p(y)dy 

J — oo 

of the EB estimator t(y) and the prior risk 

/oo 
R(y;f(y))p(y)dy 
-oo 

of the corresponding Bayes estimator t(y). However, the risk function (13. ip has several advan- 
tages compared with KR n (y). 

First, R n (y) enables one to calculate the mean squared error for the given observation y 
which is the quantity of interest. Note that the wavelet series f|2. 13j) is local in a sense that 
coefficients (a$) mt k change whenever y changes, hence, working with a local measure of the 
risk makes much more sense. Using the prior risk for the estimator which is local in nature 
prevents one from seeing advantages of this estimator. Second, by using the risk function (|3.1[) 
we eliminate the influence on the risk function of the observations having very low probabilities. 
So, the use of R n {y) provides a way of getting EB estimators with better convergence rates. 
Third, posterior risk allows one to assess optimality of EB estimators for majority of familiar 
distribution families via comparison of the convergence rate of the estimator with the lower 
bounds for the risk derived in Pensky(1997). Finally, one can pursue evaluation of the prior 
risk for the estimator f|2. 13j) . The derivation will require assumptions similar to the ones in 
Pensky(1998) and can be accomplished by standard methods. 

The error (|3. 1 1) is dominated by the sum of two components 

Rn(y)<2(R 1 (y)+R 2 {y)) (3.2) 

where the first component R\ = R\(y) is due to replacement of the Bayes estimator t(y) by its 
wavelet representation (12 .3p . while R 2 = R 2 (y) is due to replacement of vector a = B~ 1 c by as 
given by PTTD 

Ri(y) = (t m (y)-t(y)) 2 , (3.3) 



R 2 (y) = E 



m,k Q , m,k\ l Pm,k 

(y) 



(3.4) 



We shall refer to R\ and R 2 as the systematic and the random error components, respectively. 
3.2 The systematic error component 

For evaluation of the systematic error component R\, let us introduce matrix Uh and vector 
with components 

oo 

h, 



{U h ) k ,i = / z h <p{z + 2 m y - k)cp(z + 2 m y - l)dz, (3.5) 

J — oo 
/•oo 

{D h ) k = / z h tp(z + 2 m y - k)dz. (3.6) 



oc 
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Observe that Uh and Dh are independent of unknown functions p(x) and ^f(x), and that Uq = I 
where I is the identity matrix. Denote 

n m ,y = {v- \x-y\ <2- m s{M 2 -M 1 )} . (3.7) 

Then 

/oo 
ip(2 m x - k)ip(2 m x - j)p(x)dx 
-oo 

/oo 
<p(z + 2 m y - k)<p(z + 2 m y - j)p(y + 2~ m z)dz 
-oo 

r 

= ^2- mft (/ i !)-V /l) (y)^ + o(2- mr ), 

Ji=0 

where Uq = I is the identity matrix. Deriving a similar representation for q., we obtain asymp- 
totic expansions of matrix B and vector c via matrices Uh and vectors -D^, respectively, as 
m — > oo 

r 

B = p(y)/ + ^2- m/l (W)-V /l )(y)^, + (2- mr ), (3.8) 

h=l 
r 

c = 2" m / 2 ^2- m \l\)- l ^ l \y)Di +o(2~ mr ). (3.9) 
z=o 

Formula (|3.8p establishes that, for large values of m, matrix B is close to p(y)I, so the system of 
equations (|2.5j) is well-conditioned. Therefore, the inverse matrix is close to p^ 1 (y)I, or, more 
precisely (see Pensky and Alotaibi (2005)), 

B- 1 = p-\y)I - 2- m p'(y)p- 2 (y)U 1 + o(2' m ). (3.10) 

Furthermore, if m — > oo, vector a in (|2.5p tends to 2 -m / 2 [*I'(y)/p(y)]Z)o where 

2 -m/2J2(D ) kVmjk (y) = l 
k 

for any y. The latter implies that the systematic error goes to zero as m — > oo at a rate o (2 _mr ). 
In particular, the following statement is valid. 

Lemma 1 Let functions p(x) and ^f(x) be r < s — 1 times continuously differentiate in the 
neighborhood Q y of y and let £l m ,y Q Then, for R± defined in $3.3\) . as m —> oo, 

Ri = (t m (y)-t(y)) 2 = o(2- 2mr ). (3.11) 

The proof of this lemma can be obtained by a slight modification of the proof of Lemma 3.1 
in Pensky and Alotaibi (2005). 
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3.3 The random error component 

In order to calculate the random error component R2 given by (|3.4p . introduce vectors 7^"' (m), j 
1,2, with components 



1/2 

00 



, keK m , y , i = 1,2,--- , (3.12) 

where u mj k(x) are defined in (I2,10p . and denote 

7 m = ||7 (1 V)|| (3.13) 

where ||z|| is the Eucledean norm of the vector z. The following expression provides an asymp- 
totic expression for the random error component as m,n —> 00. 

Lemma 2 Let 5 2 ~ n~ 1 2 m . Then, under the assumptions of Lemma 1, as m,n — )• oo, the 
random error component R2 defined in {3.J$ is such that 



R 2 = 0(2 m n- 1 (l + 1 D) , m,n^ 00, (3.14) 

provided 2 m n~ l — > and ||7^(m)|| 2 2 2m = o(n 3 ) as n ->■ 00. 

Proofs of this and other statements in the paper are given in Section [7] Proofs. 
Observe that the values of j^\m) are independent of the unknown density g{6) and can 
be calculated explicitly. In Section we bring examples of construction of functions u m ^{x) as 

well as the asymptotic expressions for 7P (m), j = 1, 2, for some common special cases (location 
parameter family, scale parameter family, one-parameter exponential family). In vast majority 
of situations, including the ones studied in Section 02 7 2 (m) is bounded above by the following 
expression 

7^ < C^2 am exp (b2^ , b, p > 0, C 7 > 0, (3.15) 

where a, b, (3 and C 7 are the absolute constants independent of m. 

It follows from Lemmas Q] and [2] that R n given in (|3.2p is minimized at the optimal 
resolution level m = ttlq where both errors are balanced 

m = argmin (n" 1 2 m ( 7 ^ + 1) + 2~ 2mr ) . (3.16) 

In particular, under assumption (|3.15p . as n — > 00, mo is such that 



2 ™o ~ J n 2 r+i+™(«.°), if 6 = 0, / 317 x 
\ ((26)- 1 logn) 1 //3, if 6 > 0. 

Here, a n X & n for two sequences, {a n } and {6 n }, n = 1, 2, • • • , of positive real numbers if there 
exist Ci and C2 independent of n such that < C\ < C2 < 00 and C\ < a n //3 n < C2. 
Then the following statement is true. 



Theorem 1 Let twice continuously differentiable scaling function satify \2.1\) and 112. 2\) . Let 
functions p{x) and *$>(x) be r £ [1/2, s — 1] times continuously differentiable in the neighborhood 
°f U such that £l m ,y - &>y where £l m ,y is defined in \3. 7| ). Choose too according to $3.16\) and 
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let 5 2 ~ n 1 2 m ° . Then, for any y such that p{y) > 0, as n — > oo, R n (y) defined in 113. 1\) satisfies 
the following asymptotic relation 

R n (y)=E(t m (y)-t(y)) 2 = 0(2~ 2mor ), m ^ oo, (3.18) 

provided 2 m n~ 1 — > and ||^y^ 2 ^ (m-o) || 2 2 2m, ° = o(n 3 ) as n — >■ oo. In particular, if assumption 
\3. 15\) holds, then, as n — > oo, one /ios 

O fn _2 ''+ 1 +™ ax ( Q . ) N ) , if b = 0, 
Rn(y) = { >, 2,n 7 (3.19) 

O ((logn) T) , if b>0. 



4 Adaptive choice of the resolution level using Lepskii method 

Note that when 6 > 0, the optimal resolution level rriQ is determined by the values of b and 
/3 which are completely known, so that the resulting EB estimator is adaptive, i.e., it attains 
the optimal convergence rate. However, if b = 0, the value of thq depends on the unknown 
smoothness of functions p(x) and *$>(x) which is unknown. In order to construct an adaptive 
estimator in this case, we shall apply Lepskii method for the optimal selection of the resolution 
level described in Lepski (1991) and Lepskii, Mammen and Spokony (1997). 
Let 7 m satisfy condition f|3. 15|) with 6 = 0. Denote 

/Omn = 2 m (l + 7m )n- 1 logn, (4.1) 

and let mi and m n be such that 

2 mi = log n, 2 m " ( 7mn + 1) x (log n)- 2 n. (4.2) 

Denote A4 = {m : mi < m < m n } and observe that under assumption f|3. 15j) with 6 = 0, m n is 
such that 

i 

I Tl \ l+ max ( Q >0) 

2 m n 



,(C 7 + l)logV 

so that, for mo given by (|3.17p . one has m n /mo — > oo as n — > oo. 

In this situation, the Lepski method suggests to choose resolution level as m = in with 

fh = min <|m G M. : \t m {y) - tj(y)\ 2 < A 2 ||^ m || 2 + \\B^\\ 2 2 p\ n for any j > m\ , (4.3) 

where A is a positive constant independent of m and n and, for any k, B^k = -Bfc + S^I where 
5 k = 2 k / 2 n- 1 ' 2 . 

Let ui and v<i be small positive values, vi + v<i < 1, and M be the size of vector c and 
matrix B. Denote = |y?(z — A:)| and 

k 

D = IbHoo + \m\oo\MooMVM\l - vi)- 1 ' 2 + lell^lUII^HLllpll^M 3 ^! - v 2 ]- 1 (4.4) 
and let A be given by equation 

\ = mC v \\p\\ 1 J Q 2 jMD + l (4.5) 
Then, the following statement is true. 
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Theorem 2 Let twice continuously differentiable scaling function satisfy \2.1\) and K2. 2\) . Let 
functions p{x) and ^>(x) be r > 1/2 times continuously differentiable in the neighborhood Q, y of 
y and let £l m ,y ^ where £l m ,y is defined in \3. Ity . Let j m satisfy inequality \3.15\) with 6 = 0. 
Construct EB estimator of the form \2.13]) and choose rh according to |^.3| ) with A defined in 
\4-5\ ) where D given by H4-4D - If> f or an V k S K m ^ y , 



(4.6) 



then 



as 



E|4(2/)-%)| 



O [n 2r+l + max( Q ,0) l ogn j . (4.7) 

In order to see how the method works, note that the mean squared error can be decomposed 

(4.8) 



A = E\t fh (y)-t(y)\ = A 1 + A 2 
where mo is the optimal resolution level defined in formula (|3.16|) and 



Ai 
A 2 



n\trn(y)-t(y)\ I(m<m )], 

2 . 



I 6-1 1|2 _i_ II ||2 



O(p mon ) 



E[|*fn(y) -t(y)\ I(m>m )]. 
If rh < mo, then, by definition of rh, one has 

|^m(y) ~~ t mo I < A p mon 

so that 

Ai < 2[E|? a (y)-f mo (y)| 2 +E|? mo (y)-t(y)| 2 ] = O {p 2 m J . 
Now, in the case when rh > mo, by definition (|4.3p of rh, there exists j > mo, such that 

2 



(4.9) 



(4.10) 



|t;(y)-i mo (y)| 2 > A 2 p- 



I f? - 1 ||2 I II ~d 1 ||2 
l^j II + H^moH 



It turns out that probability of such an event is very low. In particularly, the following lemma 
holds. 

Lemma 3 Let conditions of Theorem^ hold. If resolution level m is such that tuq < m < m n , 
then, as n — >• oo, 



t m (y)-t mo (y) | 2 > A 2 



i^— li|2 . II — 1 ||2 
I Srn II ' II Smo 1 1 



Pmn ) = 0(n 2 ) . 



(4.11) 



This lemma implies that A 2 = 0[n 1 ) = o (p mon ) as n — > oo and, hence, Theorem [2] is 
valid. The complete proof of Theorem [2] is provided in Section [7J 
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5 Examples of construction of EB estimators 
5.1 Location Parameter Family 

In the case when is a location parameter, q(x\6) = q(x — 0), EB estimator t(y) in (II. ip is of 
the form 

, M _ Sr oo {y-e)g{y-e)g{d)dB 

Hence, u m j(x) in this case is the solution to the following equation 

/EM 
(x — 9)q(x — 9)(p m j(x)dx. 
-EM 

Taking Fourier transforms of both sides, we obtain that 

Umj (x) = 2 m / 2 C/ m (2 m x-i) (5.1) 
where U m (x) is the inverse Fourier transform of 

U m (uj) = i[q{-2 m U )\- x q'(-2 m u) (5.2) 
Using Parseval identity and taking into account that lf m ^ has finite number of terms, we derive 

poo 

7 2 (m)x / \q{-2 m u)Y l q'{-2 m u)\ 2 \<p{u)\ 2 dLO. (5.3) 



Also, the following relation allows to check validity of condition (|4.6|) : 

||«mj(a:)||oo < 2 m/2 ||C/ m (x)|| 0O . (5.4) 
Below, we consider some special cases. 

Example 1 Normal Distribution Let q{x\9) be the pdf of the normal distribution 

1 (a-?) 2 

V27T(J 



where a > is known. Then g(w) = y/ne 2 and, using properties of Fourier transform, we 
derive 

U m (x) = -2 m a\'{x). 

Since tp'(x) is square integrable, it follows from (|5.3p and (|5.4j) that 7 2 (m) x 2 2m and ||t7 m ||oo x 
2 m . Hence, condition (|4.6p of Theorem [2] holds. Then, a = 2, 2 m ° ~ and 



E|4(y) -t(y)\ 2 = O (n 2-+3 logn). 



by Theorem [2j 
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Example 2 Double-exponential distribution Let q(x\0) be the pdf of the double-exponential 
distribution 

1 [g— 9] 

where a > is known. Then q(co) = 2/(1 + (wa) 2 ), and using properties of Fourier transform, 
we obtain 

/oo 
ip m j(t)sign(x - t) exp(-|rr - t\/a)dt. 
-oo 

Note that for any oj, one has \q{— oj)\ < a. Therefore, 

fOO f'OO 

7 2 (m)x/ \0(u;)\ 2 dw = 1, \U m (x)\< \tp(u)\duxl, 



so that condition (|4,6p holds. Consequently, a = 0, 2 m ° ~ n 2r + 1 , and 

E |*fn(y) - t(y)\ 2 = O (nT^Ti \ognj . 

by Theorem [2j 

5.2 One-Parameter exponential Family 

Let the sampling distribution belong to a one-parameter exponential family, i.e. 

q(x\0) = h(6)f(x)e- xae , xeX, 6eQ, 
where h(9) > and f(x) > 0. Then, equation (|2.10p is of the form 

roc 

f(x)e~ xa0 u mtj (x)dx = / 6f(x)e~ xa9 ^ m j(x)dx. 



Integrating the right hand side by parts and solving for u m we derive 



af(x) dx 

Example 3 Weibull Distribution If is the pdf of the Weibull distribution 

q(x\9) = aBx a - x e- xa \ x > 0, 9 > 0, a > 1. 
In this case, /(x) = and = ce# and, according to (I5.5p . u m ^(x) is of the form 



= — — i • (5-6) 



Therefore, for any j = 0, 



( (D (m)] 2 = ga^-a /"^Iz + jr^^)] 2 ^ 



M 
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so that 



2 2am a~ 2 (M 2 - M X )\M 2 + j\-( 2a ~V < [^(m)} 2 < 2 2am a~ 2 {M 2 - M X )\M X + j\-( 2a ~ 2 \ (5.7) 

Let the value of y be such that c% < y < c 2 for some < c\ < c 2 < oo. Then, it is easy to 
show that if j G Km,yi then j >c 2 m . Since the set K m ^ y has a finite number of terms, it follows 
from (J5IFD that 7 2 (m) x 2 2m . 

Finally, it remains to verify whether the condition (|4.6p of Theorem [2] holds. Indeed, it 
follows from (15.6j) that for j £ K m>y , one has 



sup \ u m j(x)\ < sup 



a[Mi+j] 



a-l 



< C2 m / 2 7 (m), (5.8) 



so Theorem [2] can be applied. Hence, under assumptions of Lemma Q3 one has a = 2, 2 m ° 
n^+a , and, E Um(y) - t(y)\ = O [n logn , by TheoremEl 



Example 4 Gamma Distribution. Let the pdf of the Gamma distribution be given by 

q( x \0) = -—0 a x a - l e- x6 ', x>0, 9 > 0, a > 1. (5.9) 
T(a) 

Note that the family of densities (|5.9p is also a scale parameter family. 
Then, by formula (|5.5p . u m j(x) is of the form 

«mj(a?) = (« " l)x~ 1 2 m / 2 ^(2 m x - j) + 2 3m /y (2 m x - j). 

If y is such that c\ < y < c 2 for some < c\ < c 2 < oo, then, by calculations similar to 
Example El obtain that a = 2, 2 m ° ~ n^+s, Theorem [5] holds, so that E ^(y) - = 
O fn~ 2 <-+3 logn | . 



5.3 Scale parameter family 

If g(x|#) is a scale parameter family, g(x|#) = (§), it is difficult to pinpoint a general rule for 
finding u m j(x), however, as it follows from Example 0] many particular cases can be treated. 
Below, we consider one more example. 

Example 5 Uniform Distribution Let q(x \ 9) be given by 

q(x\9) = 9~ 1 I(0<x <9), a < 9 < b. 
Then, equation (|2.10p is of the form 

1 u m j(x)dx = / ip m ,j(x)dx (5.10) 
Jo 



o 



Taking derivatives with respect to 9 of both sides of (|5.10p and replacing 9 by x, we derive 

u mJ (x) = 2~ m ' 2 / <p(z)dz + ^2 m / 2 ^(2 m x - j). (5.11) 
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Since a < 9 < b, then also a < x < b, and it is easy to check that 

rb rb / r2 m x—j \ 2 

/ x 2 2 m <p 2 (2 m x - j)dx x 1, / 2~ m / 2 / <p(z)dz dx = 0(2~ m ), 

Ja Ja V J M\ J 

as m — > oo. Then, j(m) x 1, a = and condition (|4.6p holds. Therefore, 2 m ° ~ n 2r + 1 and, by 
Theorem [21 E |t^(y) - t(y)| 2 = (vT^ 1 log raj . 



6 Discussion 



In the present paper we suggest an adaptive wavelet-based method of EB estimation. The 
method is based on approximating Bayes estimator t(y) corresponding to observation y as a 
whole using finitely supported wavelet family. The wavelet estimator is used in a rather non- 
ortodox way: t{y) is estimated locally using only a linear scaling part of the expansion at the 
resolution level m where coefficients are recovered by solving a system of linear equations. 

The advantage of the method lies in its flexibility. The technique works for a variery of 
families of conditional distributions. Computationally, it leads to solution of a finite system 
of linear equation which, due to decorrelation property of wavelets, is sparse and well condi- 
tioned. The size of the system depends on a size and regularity of the wavelet which is used for 
representation of the EB estimator t(y). 

A non-adaptive version of the method was introduced in Pensky and Alotaibi (2005). 
However, since no mechanism for choosing the resolution level m of the expansion was suggested, 
the Pensky and Alotaibi (2005) paper remained of a theoretical interest only. In the present 
paper, we use Lepskii method for choosing an optimal resolution level m and show that the 
resulting EB estimator remains nearly asymptotically optimal (within a logarithmic factor of 
the number of observations n). 

Finally, we should comment that, although the choice of a wavelet basis for representation 
of t(y) is convenient, it is not unique. Indeed, one can use a local polynomial or a kernel estimator 
for representation of t(y). In this case, the challenge of finding support of the estimator for the 
local polynomials or bandwidth for a kernel estimator can be addressed by Lepskii method in a 
similar manner. However, the disadvantage of abandoning wavelets will be that the system of 
equations will cease to be sparse and well-posed. 



7 Proofs 

Proofs of Lemma [2] is based on the following lemmas. 

Lemma 4 Let B, c, B and c be defined in \2. 6\) . \2.9\) . \2. 7p and h2.11\) , respectively, and let 
M be the size of the vector c. If 2 m < n(logra) -2 , then, for any r > 0, 



B — B 



> M 2 T 2 2 m n" 1 logn } < 2M'-n " 1 ^ ' 
If, in addition, |^.6| ) holds, then, for any t > 0, 

IP (\\c- c\\ 2 > Mr 2 7^n _1 logn) < 2Mn"W^, 



(7.1) 



(7.2) 
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Proof of Lemma |4] The proof is based on application of Bernstein inequality 



n 



t=i 



> z I < 2 exp 



nz 



2{a 2 + \\Y\\ 00 z/3)J ' 



where Yj, t = 1, n, are i.i. d. with EY t = 0, EY t 2 = a 2 and ||Y t || < HY]^ < oo. 

Recall that bj tk defined by (12. 9p are the unbiased estimators of bj tk defined in <\2M . Denote 

n 

m = <£mj{xt)Vmk{xt) - b j>k , so that bj )k - b j>k = rr x n t , where n t are i.i.d. with E (r] t ) = 



lxll .,lL2 m Also, ||r ?t || 00 <2 m + 111 - 2 



and a 2 B = E {rj 2 ) <2\\p\ 

Applying Bernstein inequality with z = r2"2"-\/log n/n for every k £ K m ^ y and recalling 
that 2 m log n/n — > as n — > oo, we obtain 



> r2 2 1^/log n/n) < 2 exp 



< 2 exp 



-r 2 log n 



4 ||</3 



OO \ llf I I oo 

r 2 log n \ 



+ 2f 1^6™ 



IVlloo HP 



OO I \ f I I oo , 



Since matrix B has M 2 components, (|7.ip is valid. 



Now, in order to prove (|7.2p . recall that 6j given by (|2.1ip is an unbiased estimator of Cj, 
so that, for any k, variables ^ = u mk (xt) — c k , t = 1, • • • , n, are i.i.d. with E^ = and 

/oo 
U 2 mjk (x)p(x)dx<\\ P \\ ool 2 m . 
-oo 

In addition, H^tH^ < 2||u m) j.|| < 27 m 2~. Thus, applying Bernstein inequality with z = 
n~ 1 / 2 T7 m v / log n ) we obtain 



c k -c k \>n 1/2 T7 m y / logn) < 2exp 



-t 2 log n 



4 2(|| P || 00 + 2^g^; 

To complete the proof, note that vector c has M components. 



< 2 exp 



-t logn 



E 



B — B 



0[n~ l 2 ml ), I = 1,2, A, 



Lemma 5 Let B, c, B andc be defined in \2. 6\) . t2.9\) , (2.1) and \2. respectively. Ifn — > oo, 
2 m /n — > 0, one has 

(7.3) 
(7.4) 

(7.5) 



E\\c-c\\ 2 = 0{n- 1 1 2 m ) 



E lie — c\f = Oln 



7 ( 2 )(m) 



+ n 2 7m / 
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and 



+ n 



7 ( 4 V) 

(n- 5 ||7W(m) 4 7 ( 2 )( m ) 



7 (2) (m) 

2 



+ n 



-4 



7 (1) M 



E||c-c|| 8 = 0(ji~ 7 
+ o(n 

If, in addition, |^.6p holds, then, as n — > oo, 

E||c- c|| 2j ' = O (V^"^'" 1 ^ ,j = 1,2,4. 



7 (3) (m) 



(7.6) 



(7.7) 



- bj.fe = " lS ^2 r lt where r/ t = (p m:k (X t )(p m ,j(Xt), 
t=i 

t = 1, • • • , n. Hence, taking the second moment we obtain 



Proof of Lemma [5j Recall that L 



E 



5-5 



1 n,r2 



OO II r MOO 



O (rT x 2 m ) 



For Z = 2, we apply Jensen's inequality 



E 



O (n~ A [nE[r]f] + n(n - l)E 2 [r ? 2 ]]) = O (n _2 E[r/f 



Since E[r]f] < 2 2m ||p|| 00 ||<^|| 2 x;) and matrix B is of finite dimension M, (j7.3|) is valid for 1 = 2. In 
a similar manner we can show that (17.31) holds for I = 4. 



In order to prove (|7.4p ~ (l7,6p . recall that Ck~Ck = n 1 t; t where £j = u m j(X t ), t = 1, •, 



n. 



Thus, 



t=i 



IE||c — c|T < Mn~ L E[£(]<Mn 



3 7m = O (n X 7m) 



which proves ()7.4p . Now, to prove (|7.5|) . observe that 



E 



n 



t=l . 



0(n- 3 E[£f]+n- 2 E 2 [£ 2 ]) =0 



n 



+ n 



7 ( 1 )(m; 



and note that vector c is of finite dimension M . The proof of (|7.6p can be carried out in a similar 
manner. 

Now, let us check validity of (|7.7p when condition (|4.6p holds. Observe that, under condi- 
tion (|4.6p . for j = 1, 2, .... one has 



\\^(m)f<C^2^h 2 ni 
Plugging CHI) into ([73]) -(EU), obtain (17??]) . 

Lemma 6 Let n B = jw : ||B - B|| > 0.5 H-B^ir 1 } . Then, 

\\a S ~ a|| < P - clip- 1 !! + ||c|| [(5||5 -1 || 2 + 2||5- 1 || 2 || J B - B\\ + 2o" 1 I(^ B ) 



(7.8) 



+ \\c — c 



£||B _i |r + 2||5- 1 f||B - B|| + 2<T i ]I(ft B 



(7.9) 
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Proof of Lemma [6l Since a = B 1 c and as = (B + 51) c, by the properties of the 
norm, we obtain 



\a s -a\\ < WB-HU-cW + \\B7~ 1 -5 _1 ||||c|| + WB7 1 - B' 1 



W^-B-H < WB7 1 -B7 1 \\ + II B7 1 - B~ x \ 



c — c 



(7.10) 
(7.11) 



Now, since B7 1 - B7 l = ^(B - B)B7 1 , \\B7 l \\ < 5' 1 , {{B^W < 5' 1 and {{B^W < WB' 1 ] 
for the first part of the right hand side in (17. lip , we obtain 



\b: 



B 



- 1 " < 2\\B- 1 \\ 2 \\B-B\\I(n B ) + \\B7 1 -B7 1 \\I(n B ) 
< 2||£- 1 || 2 ||5 - B\\ +25- 1 I(Q B )- 



(7.12) 



For the second part of (|7.1ip we derive \\Bg — B 1 \\ < S\\B 1 || 2 . Finally, combining (|7.10p . 
([7TTT1) and (I7TT21) . we derive (f7T31) . 

Proof of Lemma [2l Recall that, since the index set K m> y is finite, by (13. 4p . one has 

2 



E 



(y) 



0(2 m EH^ -a|| 2 ) 



(7.13) 



In order to find an asymptotic upper bound for E||a,5 — a|| 2 , square the right-hand side in (|7.9p 
and find expectation in a view of Lemmas H] and [5j Taking into account that it follows from 
(ET51) and (J3l| that, asm->oo, one has ||c|| = 0(2~ m / 2 ) and B^ 1 = (p(y)y l I + 0(2" m ) where 
p(y) > is a non-asymptotic value, so that, ||-B _1 || = 0(1) and ||-B _1 || _1 = 0(1) asm-f oo, 
obtain 



Ellaj — a| 



O E||c - c|| 2 + 2~ m [2 m n" 1 + E||B - B\\ 2 + 2~ m nP(n B )] 



+ 2 m n~ 1 E||c - c|| 2 + y/E\\c-c\\*\/E\\B - £|| 4 + 2- m n V / E||c - c|| 4 v /p (^s) 



Applying (I7.ip with r = (2\\B x \\ l MyJ\ogn) 1 2 m / 2 v / n ->■ oo, obtain that, as n — > oo, 

P(J2 B ) = 0(n' h ) forany/i>0, (7.14) 

i.e., P(fi B ) tends to zero faster than any negative power of n. Then, result of Lemma [2] fol- 
lows directly from Lemma[5]and the fact that 2 m n~ 1 — > and ||7^( m )l| 2 2 2m = o(n 3 ) as n — > oo. 



Proof of Theorem [TJ. Validity of Theorem Q] follows directly from Lemmas Q] and [2j 



Lemma 7 Let 5 2 ~ n 1 2 m and assumptions &3.15]) and h4-°V ZioW. Then, under assumptions 
of LemmaUl as n — >■ oo, 

E |? m (y) - t(y)| 4 = O (n- 2 2 2m ( 7 2 , + l) 2 + 2- 4mr ) . (7.15) 
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Proof of Lemma [7J 



E\t m (y)-t(y)\ < 4E\t m (y)-t m (y)\ + 4\t m (y) - t(y)\\ 



(7.16) 



By Lemma [Q as m, n — > oo, one has \t m (y) - t(y)\ A = o(2" 4mr ). For the first term in ||71Bj> . 
note that 

n\tm(y) - tm(y)\ 4 = 0(2 2m E\\a s - a|| 4 ). 

Upper bounds for E||a,5 — o|| 4 can be derived using Lemma [6j Since, as m,n -> oo, ||c|| = 
0{2r m l 2 ) and ||^— 1 || = 0(1), it follows from ([73]) and (JETJ) that 

E||a 5 -a|| 4 = OiEWc-cW^+O^-^^ + EWB-Bf + d^PiflB) 

+ O (Ve||c - c|| 4 + y/E\\c - c\\ 8 ^E\\B - B\\ 8 + 5~ A y/E\\c - c\\ 8 ^/PJTl^J 
= 0(n-\ 1 i + l))=0(n-\ 1 i + lf), 
which completes the proof. 



Proof of Lemma EC Denote R 2 

(y) -t mQ {y)\ > XRmn) < 

+ 



I r-1 II 2 I II r-1 1 1 2 



1 2 



SrriQ I 



p 2 mn and observe that 



m{y) - t m (y)\ + \t m (y) - t(y)\ > 0.5XR mn ) 

mo(y) - tm {y)\ + \tm (y) - t{y)\ > 0.5 XRmn)- 

Since m > too and R mn is an increasing function of m, one has \t m (y) — t(y)\ = o(2~ mr ) as 
to — > oo and R m n > Rm n- Therefore, it is sufficient to show that 

F (\t m (y) - t m (y)\ > 0.5XR mn - o(2~ mr )) = 0(n~ 2 ) 

for any m > mo- Taking into account that \t m (y)—t m (y)\ < 2 m / 2 C ip \\a m — a\\ and 2~ mr / R mn — > 
as m, n — > oo, it is sufficient to show that 



dm. ft 



>2- m / 2 (X-l)R mn /(2C^ 



0(n 



-2\ 



n —> oo. 



(7.17) 



Recall that B s 1 - B 1 = B s l (B - Bs)B l , so that, for any 6 > 0, one has 



>-i 



B7 1 - B-'\\ < WB^f \\B - B\\ + 5 m ) + 2|| J B7 1 || 2 || J B- 1 



\B-B\\ 2 + 5 2 



and, also, 



| a<5 — all < ||-Br 1 ||||c — c|| + 



\Bf-B^\ 



Consequently, probability in (|7,17p can be partition into three terms: 

> 2- m ' 2 (X - l)R mn /(2C^)) < Pi + P 2 + P 3 



dm. CI 



(7.18) 
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where 



P ( \\Bg L \\\\c- c\\ > 



Q l5 mn (A — 1) 

2™/ 2 2C v 



|c||[||5 - 511 + S m ] > WTT^VI^(A - 1) 



P \\B-^\\B -Bf + 51] > Wl+«n^ 

\ 4^/77.0^11011 



and at, a 2 and «3 are positive constants such that at + «2 + 03 = 1- 

Applying (|7.2[) and taking into account that \\Bs\\ — 2\\p\\oo, obtain 



Px < P f ||c- c|| 2 > ai ^^fV^ ^-) < 2Mn-" (7.19) 
\ 4 IMI°c VnC^ J 

where n = (128MC 2 llp^M)" 1 a\ (A-l) 2 . Recalling that ||c|| < 2M||^|| 00 2- m / 2 , using formula 
(j7.1j) and taking into account that 1 — 4M||\I r || 00 C ¥ ,/(a2(A — 1) > 1 — Vt for any small positive 
constant v\ as n — > 00, we derive 

P < f(\\B B\\> - 1)2-/ 2 yiT^Vbg^ 2W2 \ 

2 " y 11 " 4M||*|| 0O c v v^ J 

< P ( ||5 - B\\ > M2m/ ^ M^lVIH^^ < 2M , n -r 2 (7 . 20) 



where r 2 = (USM^jnlMloMoc)- 1 aj(l - *i)(l - A) 2 . 

In order to find an upper bound for P3, recall that H-B^H < 2M/p(y) and ||c|| < 
2M||^/|| 00 2 _m/ ' 2 . Also, note that p(y) > (log n) -1 / 2 , for any fixed y, as n — > 00. Therefore, apply- 
ing (|7.ip and taking into account that, due to (|4.2p and m < m n , one has (2 _m ' /2 y / n — 1) / log n > 
1 — z/2 for any small positive constant vi as n — > 00, derive 

p, , p _ fl|| a > «s(A - 1) y/l + ^y^n _ ^ A 



/-2 

< P ( 115 - 5|| 2 < ' 



2^6^115-! || ||c|| 

o 2 m M 2 logn a 3 (A-l) \ ? „ , . 

< — wsmtm < 2M 2 n~ T3 7.21 

where r 3 = (128ikT 3 C^ || ^ || oo || c^|| ^ ||^||^ )- x a 3 (A - 1)(1 - m). 

Now, in order to complete the proof, combine (|7.18|) - (17.21H and choose a«, i = 1,2,3, 
such that Tj > 2 for z = 1,2, 3, and Pi + P2 + P3 takes minimal value. 



Proof of Theorem [2J. First, let us show that E 



| J B7 1 || 2 + || 57 1 || 2 

1 omll ' H omoll 



O(l) as m, n 



00, so that asymptotic relation (|4.9p holds. Indeed, for mi <m< mo and any fixed 7/, one has 

H-^mll - W^Sm ~ BgmW + H^SmH 

- 2 ll- B 7ml| 2 H- B 5"i - 5 5m || + 25~ 1 I(O m ) + H5" 1 !! 
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where O m is defined in Lemma El Then, 

E ll4ml| 4 = " B 5mt + C P (^m) + HOI 4 ) = O(l), 

so that both and (|4.10p are valid. 

In order to find an upper bound for A2, note that by Lemmas [7] and El one has 

A 2 = E[\t rn (y)-t(y)\ 2 I(m>m )} 

m„ 1 / 

< Y, \f^\t^(y)-t(y)\ 4 ^n\Uy)-t mo (y)\ 2 >X 2 pl 

i=mo+l 

so that A2 = 0(n _1 ) = o (Pm on ) as n — > 00. 
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